This article describes experimental results obtained when a previously described
fault diagnosis system was run on-line in real time at the 34-m beam waveguide
antenna at DSS 13. Experimental conditions and the quality of results are described.
A neural network model and a maximum-likelihood Gaussian classifier are compared
with and without a Markov component to model temporal context. At the rate of a
state update every 6.4 seconds, over a period of roughly 1 hour, the neural-Markov
system had zero errors (incorrect state estimates) while monitoring both faulty and
normal operations. The overall results indicate that the neural-Markov combination
is the most accurate model and has significant practical potential.


I. Introduction 

In previous articles the problem of fault diagnosis of an- 
tenna pointing systems has been discussed in great detail 
[1,2]. Briefly, the problem is that of identifying whether 
an antenna pointing system is operating normally and, if 
not, which particular fault has occurred. The informa- 
tion available to the classifier system consists of time-series 
data from various sensor points in the antenna’s pointing 
servomechanism. Identifying a fault condition is rendered 
nontrivial by the fact that feedback, redundancy, nonlin- 
earities, and a considerable amount of noise from external 
disturbances are all present in the servo-control loop. This 
results in a masking of the underlying cause of a particular 
problem. 

In [2] a hybrid signal-processing/neural-network architecture
was applied to the problem. Specifically, various
characteristic features were extracted from 4-second blocks
of the multichannel data. The most useful features for 
discriminatory purposes were found to be autoregressive 
coefficients and standard deviations (estimated for certain 
channels). In [2] it was further shown that classification 
accuracies of about 90 percent (on test data sets indepen- 
dent of the training data) were achievable using a neural 
network model, where there were four classes to be pre- 
dicted: normal, tachometer failure, noisy tachometer, and 
compensation loss. By taking into account time corre- 
lations in the data, accuracies of about 98 percent were 
attainable. 

The results in [2] were obtained using off-line postpro- 
cessing of data collected from the elevation axis drives of 
the 34-m beam waveguide (BWG) antenna at DSS 13. 
Specifically, the time series sensor data were recorded
directly in digital format by the LabView data acquisition
software running on a Macintosh II computer. At a
sampling rate of 50 Hz with 12 channels of interest, this
resulted in large volumes of data being recorded; hence,
for practical reasons, only 5 minutes worth of data were 
recorded for each class on any given day of data acqui- 
sition, since even a 5-minute batch of such data typically 
requires about 0.4 Mbyte of storage memory. As described 
in [2], the 5-minute segments of data were then downloaded 
to a Sun workstation computer to be used for training and 
testing various fault diagnosis algorithms. This off-line 
analysis led to the development of specific pattern recog- 
nition algorithms described in [2]. 

In this article, an on-line experiment is described where 
a particular classification model was implemented as part 
of the data acquisition software, allowing direct on-line 
estimation and classification at the antenna itself. The 
primary goal of this experiment was to determine if the 
pattern recognition system described in [2] could be imple- 
mented on-line at low cost and provide reliable estimates 
of normal and fault conditions in real time. The article 
describes the experiment setup, the results obtained, and 
concludes that the system has significant practical poten- 
tial for antenna monitoring. 

II. Implementation of the On-Line Fault 
Diagnosis Software 

In [2] a fault diagnosis system was described which esti- 
mates various features from multichannel time-series data 
over consecutive windows, calculates a posterior probabil- 
ity estimate using these features, and then updates its esti- 
mate of the state of the system as a function of the current 
class estimates based on this window and the previous esti- 
mates. In [2] a number of different models were evaluated, 
having roughly comparable performance in terms of classi- 
fication accuracy. For this experiment the authors chose to 
implement one particular model using the LabView soft- 
ware package. The LabView software package is intended 
for real-time data acquisition and analysis. Its primary ad- 
vantage is that the system can be easily programmed and 
modified using an intuitive graphical user interface. Basic 
data acquisition capabilities, filtering, and signal analysis 
can be configured directly by editing a block diagram dis- 
play of the signal flow in the system. For the purposes of 
this experiment the primary functions to be implemented 
included analog to digital conversion and buffering of the 
multichannel sensor data, estimation of the classification 
features on a block-by-block basis, implementation of the 
classification equations which estimate class probabilities 
as a function of the features, and a memory-based algo- 
rithm which updates a context-based estimate of the sys- 
tem state given each block. 


A. Data Acquisition and Timing 

The LabView software was set up to acquire k consecutive
samples on each channel of interest, where the data
are sampled at a rate of $f_s$. For the experiments described
in this article $f_s = 54$ Hz, which is sufficient given that
all signals of direct interest are below 10 Hz. A block
size of k = 200 was chosen, providing a reasonable trade-off
between being large enough to get reliable parameter
estimates and not introducing too long a delay between
classification decisions. Hence, although the time series data
sampling rate is 54 Hz, the classifier operates at a much
slower rate, since a decision every few seconds is deemed
quite adequate for this type of application. For simplicity
of implementation the data were not pipelined in the
LabView software, which meant that data were sampled
and collected for a period of $\tau_1$ seconds (where
$\tau_1 = k/f_s$) and then processed (parameter estimation,
classification, and state update) for another $\tau_2$ seconds. After
this total interval of $\tau_t = \tau_1 + \tau_2$ seconds, the next block
of sample data was acquired. Hence, the state estimates
were updated every $\tau_t$ seconds. In the actual experiments
to be described later, $\tau_2$ was on the order of 2.7 seconds,
leading to a state update every 6.4 seconds. The “gaps”
in the data of 2.7 seconds (between the 3.7-second blocks) 
were deemed inconsequential to the overall operation of 
the system, since no portion of the model relies on an as- 
sumption that the window blocks are contiguous in time. 
However, it should be noted that if this system were to be 
implemented in an operational mode, a pipelined scheme 
with no data gaps would be a more elegant solution. 
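
The timing of one acquire-then-process cycle can be checked with a few lines of arithmetic. The sketch below uses only the values quoted in this section (k = 200, a 54-Hz sampling rate, and roughly 2.7 seconds of processing); the variable names are illustrative and are not taken from the LabView implementation.

```python
# Timing of one non-pipelined acquire-then-process cycle.
k = 200        # samples per block
f_s = 54.0     # sampling rate in Hz
tau_2 = 2.7    # processing time per block in seconds (approximate)

tau_1 = k / f_s          # acquisition time per block, about 3.7 s
tau_t = tau_1 + tau_2    # interval between state updates, about 6.4 s

print(f"tau_1 = {tau_1:.1f} s, tau_t = {tau_t:.1f} s")
```

In a pipelined scheme the update interval would instead approach the larger of the two terms rather than their sum.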

B. Feature Estimation 

The particular classification model chosen for the ex- 
periment uses 12 features. The first eight features are 
autoregressive-exogenous (ARX) model coefficients 
estimated from a model which has the antenna servo- 
controller elevation rate command as the exogenous (forc- 
ing) model input and the motor current of one of the 
elevation drive motors as the model output (see [2] for 
more details). These coefficients were estimated from the 
200-sample blocks using a standard least-squares estima- 
tion algorithm. The other four features used as input to 
the classifier are: estimated standard deviation of both 
elevation drive motor currents, estimated standard devia- 
tion of the average tachometer sensor, and the estimated 
standard deviation of the difference of the two tachometer 
sensors. 
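
A minimal sketch of the feature computation for one block is given below, assuming NumPy. The text states only that eight ARX coefficients are estimated by least squares; the split into four autoregressive and four exogenous terms (`na = nb = 4`), the function names, and the argument names are assumptions made for illustration and may differ from the model structure used in [2].

```python
import numpy as np

def arx_features(u, y, na=4, nb=4):
    """Least-squares ARX fit: y[t] = sum_i a_i*y[t-i] + sum_j b_j*u[t-j] + e[t].

    The orders na = nb = 4 are an assumption; the text only says that
    eight ARX coefficients are estimated.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    p = max(na, nb)
    rows = []
    for t in range(p, len(y)):
        past_y = y[t - na:t][::-1]   # y[t-1], ..., y[t-na]
        past_u = u[t - nb:t][::-1]   # u[t-1], ..., u[t-nb]
        rows.append(np.concatenate([past_y, past_u]))
    phi = np.array(rows)             # regression matrix
    theta, *_ = np.linalg.lstsq(phi, y[p:], rcond=None)
    return theta                     # [a_1..a_na, b_1..b_nb]

def block_features(rate_cmd, current_1, current_2, tach_1, tach_2):
    """Assemble the 12-component feature vector for one block of samples."""
    tach_1 = np.asarray(tach_1, dtype=float)
    tach_2 = np.asarray(tach_2, dtype=float)
    arx = arx_features(rate_cmd, current_1)      # rate command in, motor current out
    stds = [np.std(current_1), np.std(current_2),
            np.std((tach_1 + tach_2) / 2.0),     # average tachometer
            np.std(tach_1 - tach_2)]             # tachometer difference
    return np.concatenate([arx, stds])
```

With a 200-sample block, the regression matrix in this sketch has 196 rows and 8 columns, so the least-squares problem is well overdetermined.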

C. Classification Using a Neural Network Model 

The particular classification model implemented was a
multilayer neural network model: a feed-forward network
with a single hidden layer of nonlinear “hidden” units.
The model had 12 inputs (one for each of the features) 
plus a constant bias input set to 1. The particular model 
used in the experiments had 20 hidden units. The output 
layer consisted of four units, one for each class: normal, 
tachometer failure, tachometer noise, and compensation 
loss. An output activation vector y(t) at time t is calculated
as a function of the input feature vector x(t) as
described in detail in Appendix A of [2]. In particular this
model can be implemented via two matrix multiplications
and a componentwise nonlinear vector transformation, i.e.,

$$y(t) = \gamma(W_2 \cdot \gamma(W_1 \cdot x(t)))$$

where $W_1$ and $W_2$ are the weight matrices between the
input and hidden layers and the hidden and output layers,
respectively, and $\gamma(u) = (\gamma(u_1), \ldots, \gamma(u_n))$
where $\gamma(z) = 1/(1 + e^{-z})$. The
ith component of the output vector y(t) is interpreted as a
rough estimate of the posterior probability of class i given
the input feature vector x(t).
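
The forward pass described in this section can be sketched as follows, assuming NumPy. The layer sizes (12 features plus a bias input, 20 hidden units, 4 outputs) come from the text; the random weights are placeholders standing in for the trained weights of [2], and the function names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Componentwise logistic nonlinearity 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    """Two matrix multiplications with a componentwise nonlinearity after each."""
    x_aug = np.append(x, 1.0)        # constant bias input set to 1
    hidden = sigmoid(W1 @ x_aug)     # 20 hidden-unit activations
    return sigmoid(W2 @ hidden)      # 4 output activations, one per class

def normalize(y):
    """Scale the outputs to sum to 1 so they can be read as probabilities."""
    return y / y.sum()

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(20, 13))   # placeholder weights, not trained
W2 = rng.normal(scale=0.1, size=(4, 20))    # placeholder weights, not trained
x = rng.normal(size=12)                     # stand-in feature vector

y = forward(x, W1, W2)
p = normalize(y)
```

Because each output passes through its own logistic function, the raw outputs need not sum to 1, which is why a separate normalization step is applied before they are read as class probabilities.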

D. State Estimation With a Hidden Markov Model 

The class probability estimates as produced by the clas- 
sifier at each time t do not take into account the fact that 
faults are typically correlated over time. In [2] a heuris- 
tic scheme was described to reflect this prior information, 
where the current estimate of the system state was a func- 
tion of both the present probability estimates and past 
probability estimates over a specified number m of past 
windows. In Appendix A a formal model of this time 
dependence is introduced using a Hidden Markov Model 
(HMM). The HMM approach has been successfully used in 
applications such as speech recognition to model temporal 
context. The n states of the system (in this case the nor- 
mal and fault classes, n = 4) comprise the Markov model. 
For this application components of the Markov transition 
matrix A (of dimension n x n) are estimated subjectively 
rather than estimated from the data, since there is no reliable
database of fault-transition information available at
the component level. The hidden component of the model
arises from the fact that one cannot observe the states di- 
rectly, but only indirectly via a stochastic mapping from 
states to symptoms (the features). In speech recognition 
it is often desired to calculate the most likely sequence of 
states (perhaps phonemes) given some acoustic evidence. 
For a fault diagnosis application such as this it is sufficient 
to calculate at time t the probability that the system is in 
any particular state at that time. The state vector is denoted
by s(t). As described in Appendix A the probability
estimate of state i at time t can be calculated recursively
via

$$u(t) = A \cdot p(s(t-1))$$

and

$$p(s_i(t)) = \frac{u_i(t)\, y_i(t)}{\sum_{j=1}^{n} u_j(t)\, y_j(t)}$$

where the estimates are initialized by a prior probability
vector p(s(0)), the $s_i(t)$ and $u_i(t)$ are the components of
s(t) and u(t), respectively, 1 ≤ i ≤ n, and the $y_i(t)$ are the
outputs of the classifier as described in Section II.C.
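
The recursive update can be sketched in a few lines, assuming NumPy. The transition probabilities (0.99 on the diagonal, 0.01/3 elsewhere) and the uniform prior of 0.25 are the values given in Appendix A; the function names and the sequence of classifier outputs, including the single "glitch" window, are hypothetical and chosen only to illustrate the smoothing behaviour.

```python
import numpy as np

def make_transition_matrix(n=4, stay=0.99):
    """a_ij = stay for i == j and (1 - stay) / (n - 1) otherwise."""
    A = np.full((n, n), (1.0 - stay) / (n - 1))
    np.fill_diagonal(A, stay)
    return A

def hmm_update(p_prev, y, A):
    """One step of the recursion: predict with A, weight by y, renormalize."""
    u = A @ p_prev       # predicted state probabilities
    s = u * y            # weight by the current classifier outputs
    return s / s.sum()   # normalize so the state probabilities sum to 1

A = make_transition_matrix()
p = np.full(4, 0.25)     # prior probability vector p(s(0))

# Hypothetical classifier outputs: state 0 favoured, one glitch in window 5.
typical = np.array([0.8, 0.1, 0.05, 0.05])
glitch = np.array([0.1, 0.8, 0.05, 0.05])
outputs = [typical] * 5 + [glitch] + [typical] * 5

history = []
for y in outputs:
    p = hmm_update(p, y, A)
    history.append(p)
```

In this example the estimate for state 0 stays dominant through the glitch window, because a single contradictory observation is outweighed by the strong prior that the state persists from one window to the next.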

E. Software Implementation 

Implementation of the ARX parameter estimation,
classifier equations, and HMM estimation was
straightforward using predefined functions in LabView.
All of the above components of the algorithm were im- 
plemented, integrated with data acquisition software, and 
tested in the laboratory in about 3 working days. 

As mentioned earlier, the time taken to process a single
block of 3.7 seconds' worth of data was estimated at 2.7 seconds.
Of this time, it was estimated that 2.3 seconds were
spent calculating the eight ARX coefficients and the standard
deviation terms on the 200 samples, and 0.12 second
was spent updating the screen displays; the actual classification
equations (the neural network model and Markov
equations) took only 0.08 second, and another 0.20 second
went to other miscellaneous bookkeeping activities. It is
interesting to note that the classification process itself is
quite fast, taking only 2.9 percent of the processing time,
while the feature extraction takes up 85 percent of the
time. These processing times are significantly longer than those
which would be attainable for operational purposes were
the system to be implemented using special-purpose signal
processing hardware.



III. Experimental Conditions 

The experiments were carried out in early November 
1991 at the 34-m BWG antenna at DSS 13. Wind condi- 
tions were favourable for the duration of the experiments, 
i.e., no high wind conditions were encountered. Under each 
desired state (normal and fault classes) the antenna was 
driven at a typical DSN tracking rate (between 1 and 3 mil- 
lidegrees per second) along the elevation axis. Normal con- 
ditions corresponded to normal antenna operation. Fault 
conditions were simulated as described previously in [2], 
i.e., hardware faults were switched into the feedback loop 
of the elevation servo-control loop in a controlled manner.
For safety reasons these faults were introduced when
the antenna was stopped rather than while it was moving
(this precluded testing the response of the models in
terms of transitions between faults). Each fault condition 
was then monitored after the antenna had been restarted. 
Hence, the experimental results in the next section reflect 
the fact that the data were collected in different batches in 
this manner. Faults were typically monitored for durations 
of 10 to 15 minutes and the total duration of data to be 
reported in the next section amounted to a total of about 
1 hour of antenna monitoring. Longer duration monitor- 
ing was not practical due to various constraints such as 
other demands on the antenna and the amount of setup 
time required to get the monitoring system in place. Due 
to hardware problems it was not possible to simulate one 
of the faults described in [2], namely the tachometer noise 
condition. Hence, the results pertain to normal conditions, 
the tachometer failure, and compensation loss faults. 


IV. Experimental Results 

For purposes of comparison, the model implemented at 
DSS 13 is compared with a simple maximum-likelihood 
Gaussian classifier whose operation is described in Ap- 
pendix B. The results with the Gaussian classifier were 
generated off-line using the recorded feature data. In ad- 
dition, data were recorded with and without the Markov 
portion of the model to evaluate the effectiveness of this 
component. Hence, there are four types of models being 
compared: Gaussian, Gaussian followed by Markov, neural 
network, and neural network followed by Markov. Again 
it is emphasized that the results to be described pertain- 
ing to the latter two models were generated on-line in real 
time. 

Table 1 summarizes the overall classification perfor- 
mance of each of the models on five different runs: two 
for normal conditions, two for the compensation loss fault, 
and one run for the tachometer failure fault. The units in 
the table correspond to windows of 3.7 seconds worth of 
sensor data spaced at 6.4-second intervals. The bottom 
row of the table tabulates the number of windows per run: 
hence, for example, 596 windows in total were analyzed, 
corresponding to about 1 hour and 3.6 minutes total dura- 
tion. The numbers in the other four rows of the table (one 
row per model) indicate the number of windows misclas- 
sified by each model for each run, except for the final two 
columns, which give the overall number of misclassifications
per model and the percentage misclassified, respectively.
Clearly, from the final column, the neural-Markov
model (implemented in real time) is the best model in the
sense that no windows at all were misclassified. It is significantly
better than the Gaussian classifier, which performed
particularly poorly under fault conditions. Under normal
conditions, however, the Gaussian classifier was quite
accurate, having only one false alarm during the roughly
30 minutes of time devoted to monitoring normal conditions.
The Markov model is clearly seen to be beneficial,
in particular in reducing the effects of isolated random errors.
However, for the compensation loss on day 316, the
Markov model actually worsened the already poor Gaussian
model results (74 versus 50 misclassified windows), which is
to be expected if the non-Markov component is doing
particularly poorly, as in this case.
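
The summary statistics quoted from Table 1 can be recomputed directly from the per-run counts. The short sketch below uses the counts in the table and the 6.4-second update interval; the variable names are illustrative. Percentages are rounded to two decimals here and so can differ from the tabulated values in the last digit.

```python
# Per-run counts from Table 1, in the order:
# day 311 normal, day 311 compensation loss,
# day 316 normal, day 316 tachometer failure, day 316 compensation loss.
windows = [128, 60, 152, 126, 130]
errors = {
    "Gaussian":          [1, 15, 0, 35, 50],
    "Gaussian + Markov": [1, 8, 0, 3, 74],
    "Neural":            [1, 0, 4, 0, 0],
    "Neural + Markov":   [0, 0, 0, 0, 0],
}
update_interval = 6.4  # seconds between state updates

total_windows = sum(windows)
duration_min = total_windows * update_interval / 60.0
totals = {model: sum(errs) for model, errs in errors.items()}

print(f"{total_windows} windows, about {duration_min:.1f} minutes")
for model, total in totals.items():
    print(f"{model:18s} {total:4d} {100.0 * total / total_windows:6.2f}%")
```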

Figures 1 through 5 plot the estimated probability of 
the true class as a function of time for various models to 
allow a more detailed interpretation of the results. Note
that, given that the true class is labelled i, the estimated
probability of class i from the neural network corresponds
to the normalized output of output unit i of the network
at time t, i.e.,

$$p_i(t) = \frac{y_i(t)}{\sum_{j} y_j(t)}$$

while the Markov probabilities correspond to the estimates
of state i, $p(s_i(t))$, as described earlier. Figure 1 corresponds
to normal conditions on day 311 and compares the
neural model with and without the Markov processing. 
Figure 1 demonstrates that the instantaneous probability 
estimates from the neural model have a large variation 
over time and are quite noisy. This is essentially due to 
the variation in the sensor data from one window to the 
next, since, as might be expected, signals such as motor 
current are quite noisy. In addition, a large glitch is visi- 
ble at around 460 seconds. The neural model gives a low 
probability that the condition is normal for that particu- 
lar window (in fact a large glitch such as this looks like a 
tachometer failure problem); however, the Markov model 
remains relatively unaffected by this single error. Over- 
all, the stability of the Markov model is clearly reflected 
in this plot and should be advantageous in an operational 
environment in terms of keeping the false alarm rate to a 
minimum. Figure 2 is the same as Fig. 1 except that the 
data were obtained on day 316. Note that in both Figs. 1 
and 2, at any particular instant the classifier assigns to 
the true class a probability of up to only 0.8 or 0.9. In 
contrast, by modelling the temporal context, the Markov 
model assigns a much greater degree of certainty to the 
true class. 

Figures 3 and 4 compare the performance of the Gaus- 
sian and neural models on detecting the compensation loss 
fault. In Fig. 3, the variation in the Gaussian estimates 
is quite marked. The Gaussian-Markov model combination,
after some initial uncertainty for the first 90 seconds,
settles down to yield reasonable estimates. However, the 
overall superiority of the neural-Markov model is evident. 
In Fig. 4 (the same fault on a different day), the neural 
and Gaussian models are compared directly. The varia- 
tion in the Gaussian estimates is even worse. In fact, the 
resulting Gaussian-Markov estimates are almost random 
in nature and have been omitted for clarity. 

Figure 5 also compares the Gaussian and neural models 
but on the tachometer failure fault. Once again the vari- 
ation over time of the Gaussian estimates is unacceptably 
large, and again, the resulting Gaussian-Markov estimates 
have not been plotted due to their almost random nature, 
i.e., the variation over time in the estimates is so great
that the Markov model cannot find any significant corre- 
lation. 


In Fig. 6, the same information as that for Fig. 3 is 
plotted but in a different manner. Let p be the probability 
estimate of the true class. Then 1 — p is the effective error 
in the estimate; Fig. 6 is a logarithmic plot of this error 
term. Here, the overall dominance of the neural-Markov 
model is once again evident, as is the large variation in the 
Gaussian model. 


V. Conclusion 

The DSS 13 field results discussed in this article provide 
convincing evidence for the potential of a pattern recog- 
nition system to reliably monitor a DSN antenna drive 
system in real time. In addition, it should be noted that 
the monitoring system is unobtrusive and relatively easy to 
install due to its use of inexpensive “off-the-shelf” PC-based
hardware and software components. 
Table 1. Summary of classification results for various models and runs during days 311 and 316 at DSS 13.

                          Number of blocks misclassified
                          Day 311                 Day 316                              Overall misclassification statistics
Model                     Normal  Compensation    Normal  Tachometer  Compensation     Total no. of          Total percent
                                  loss                    failure     loss             misclassifications    misclassified
Gaussian                  1       15              0       35          50               101                   16.94
Gaussian + Markov         1       8               0       3           74               86                    14.42
Neural                    1       0               4       0           0                5                     0.84
Neural + Markov           0       0               0       0           0                0                     0.00
Total number of windows   128     60              152     126         130              596

Fig. 5. Comparison of Gaussian and neural models: estimate of probability of true class
(tachometer failure) as a function of time for day 316.

Fig. 6. Comparison of Gaussian-Markov and neural-Markov models: plot of log(1 − p), where
p is an estimate of the probability of the true class (compensation loss) as a function of time
for day 311.






Appendix A 
Hidden Markov Models 


Assume that time t is discretized and the system of
interest is always in one of n states. Further assume the
system of interest is described by a Markov model, i.e., at
each time t the probability that the system is in any state
i, $p(s_i(t))$, is only a function of the state j that the system
was in at time t − 1. Strictly speaking, this is a first-order
Markov model. Given that the system was in state j at
time t − 1, the probability $a_{ij} = p(s_i(t) \mid s_j(t-1))$ is the
state transition probability and the n x n matrix A with
components $a_{ij}$ characterizes the Markov model.

In practice, however, the states may not be directly 
observable, giving rise to the notion of a hidden Markov 
model. Instead, each state produces an observable set 
of symptoms or features x(t) at each time t. The features
x(t) are not a deterministic function of the state,
rather they are described by a probability density function
$p(x(t) \mid s_i(t))$. Note that the probability of obtaining
a particular x(t) depends on which state i the system is
in at time t. It is this fact that makes it possible to infer
the probability that the system is in any state i simply by
observing the features x(t).

This model describes quite closely the fault diagnosis
problem, namely the system (the antenna elevation-axis
servo-control loop) is in some unknown state (normal and
failure modes) at each time instant that the classifier looks
at the sensor data. The neural network classifier learns
the instantaneous relationship between states and features,
i.e., it provides an estimate of $p(s_i(t) \mid x(t))$. However, the
best estimate of the state at time t uses all of the information
available up to time t, namely the observed feature
vectors $\{x(t), \ldots, x(1)\}$. Hence, the best estimate of state
i requires the calculation of $p(s_i(t) \mid \{x(t), \ldots, x(1)\})$,
which is now derived:


$$p(s_i(t) \mid \{x(t), \ldots, x(1)\}) = \frac{p(s_i(t), \{x(t), \ldots, x(1)\})}{p(\{x(t), \ldots, x(1)\})} = \frac{\alpha_i(t)}{p(X)}$$

where $X = \{x(t), \ldots, x(1)\}$ and

$$\alpha_i(t) = p(s_i(t), \{x(t), \ldots, x(1)\})$$

$$= \sum_{j=1}^{n} p(s_i(t), x(t), \ldots, x(1), s_j(t-1))$$

$$= \sum_{j=1}^{n} p(s_i(t), x(t) \mid x(t-1), \ldots, x(1), s_j(t-1)) \, p(x(t-1), \ldots, x(1), s_j(t-1))$$

$$= \sum_{j=1}^{n} p(s_i(t), x(t) \mid s_j(t-1)) \, \alpha_j(t-1)$$

by the Markov assumption and the definition of $\alpha_j$,

$$= \sum_{j=1}^{n} p(x(t) \mid s_i(t)) \, p(s_i(t) \mid s_j(t-1)) \, \alpha_j(t-1)$$

since x(t) is assumed independent of past states. Hence,
by Bayes' rule and the definition of $a_{ij}$,

$$p(s_i(t) \mid X) = \frac{1}{p(X)} \sum_{j=1}^{n} \frac{p(s_i(t) \mid x(t)) \, p(x(t))}{p(s_i(t))} \, a_{ij} \, \alpha_j(t-1)$$

$$= \frac{p(x(t))}{p(X)} \, \frac{p_i(t)}{p(s_i(t))} \sum_{j=1}^{n} a_{ij} \, \alpha_j(t-1)$$


where pi(t) is the instantaneous probability estimate of 
state i given x(t) (namely, the normalized output of the 
neural network in this case), and $p(s_i(t))$ is the prior probability
of state i. Because (by definition)

$$\frac{1}{p(X)} \sum_{i=1}^{n} \alpha_i(t) = 1$$

$p(x(t))/p(X)$ can be treated as a constant and factored
out. Hence, by virtue of the recursion relation defined
on the $\alpha$'s above and given p(s(0)), the probability of
each state at time t can be computed recursively from ear- 
lier state information. These basic recursion relations are 
known as the forward-backward relations in the literature. 
For a more extensive discussion on hidden Markov models 
and their applications, the reader is referred to [3]. 


For the experiments described in this article, the initial
states $p(s_i(0))$ were set equal to the priors, which in turn
were taken to be 1/n = 0.25. The $a_{ij}$ were set to 0.99 for
i = j and 0.01/3 for i ≠ j.
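
Written out for n = 4, these settings correspond to the following transition matrix and initial vector. This is simply the stated values arranged in matrix form.

```latex
A =
\begin{pmatrix}
0.99   & 0.01/3 & 0.01/3 & 0.01/3 \\
0.01/3 & 0.99   & 0.01/3 & 0.01/3 \\
0.01/3 & 0.01/3 & 0.99   & 0.01/3 \\
0.01/3 & 0.01/3 & 0.01/3 & 0.99
\end{pmatrix},
\qquad
p(s(0)) = (0.25,\ 0.25,\ 0.25,\ 0.25)^{T}
```

Each column of A sums to 0.99 + 3(0.01/3) = 1, as required for transition probabilities out of a given state. Under a first-order Markov model, a self-transition probability of 0.99 implies a geometrically distributed dwell time with mean 1/(1 − 0.99) = 100 updates, or roughly 640 seconds at one update every 6.4 seconds.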
Appendix B

A Maximum-Likelihood Gaussian Classifier 


Consider that there are n classes $\omega_i$, 1 ≤ i ≤ n. In
turn, the features are described by a d-component feature
vector x. For a Gaussian classifier, one assumes that the
probability density $p(x \mid \omega_i)$ is multivariate normal for each
class, i.e.,

$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)}$$

where $\mu_i$ is a d-component mean vector and $\Sigma_i$ is a d x d
covariance matrix.

By Bayes’ rule, 


$$p(\omega_i \mid x) \propto p(x \mid \omega_i) \, p(\omega_i)$$

where $p(\omega_i)$ is the prior probability of class i. Given a particular
feature vector x, one calculates $y_i = p(x \mid \omega_i) \, p(\omega_i)$,
from which one gets

$$p(\omega_i \mid x) = \frac{y_i}{\sum_{j=1}^{n} y_j}$$

which is the posterior estimate of the probability of class
i given the feature data. For the results reported in this
article, $\Sigma_i$ was assumed to be diagonal since there were not
sufficient data to obtain reliable estimates of all the elements
of the full d x d (d = 12) covariance matrix for each
class. The diagonal variance terms $\sigma_i$ and the means in
each dimension were estimated directly from the data using
maximum likelihood estimates. The priors $p(\omega_i)$ were
chosen to be 1/n = 0.25 for each class.
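
A minimal sketch of such a classifier with diagonal covariance and equal priors is shown below, assuming NumPy. The class name, method names, and the synthetic training data are illustrative only and are not the recorded feature data. The likelihoods are evaluated in log space, a common precaution against underflow that the text does not mention.

```python
import numpy as np

class DiagonalGaussianClassifier:
    """Maximum-likelihood Gaussian classifier with diagonal covariance."""

    def fit(self, X, labels, n_classes):
        # Per-class, per-dimension maximum-likelihood means and variances.
        self.means = np.array([X[labels == i].mean(axis=0) for i in range(n_classes)])
        self.vars = np.array([X[labels == i].var(axis=0) for i in range(n_classes)])
        self.log_priors = np.log(np.full(n_classes, 1.0 / n_classes))  # equal priors
        return self

    def posterior(self, x):
        # log p(x | w_i) for a diagonal covariance matrix.
        diff2 = (x - self.means) ** 2
        log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * self.vars) + diff2 / self.vars, axis=1)
        log_y = log_lik + self.log_priors      # log of p(x | w_i) p(w_i)
        y = np.exp(log_y - log_y.max())        # subtract the max for stability
        return y / y.sum()                     # normalize over the classes

# Synthetic stand-in data: 4 classes, 12 features, 50 samples per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=3.0 * i, size=(50, 12)) for i in range(4)])
labels = np.repeat(np.arange(4), 50)

clf = DiagonalGaussianClassifier().fit(X, labels, n_classes=4)
post = clf.posterior(np.full(12, 3.0))   # a point at the centre of class 1
```

The diagonal assumption reduces the number of covariance parameters per class from d(d + 1)/2 = 78 to d = 12, which is the practical motivation given above for limited training data.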